-
- ~4Dgifts/toolbox/src/exampleCode/speech README
-
- new expanding subtree containing software for speech recognition
-
- See also the speech Frequently Asked Questions file
- ~4Dgifts/toolbox/FAQs/netfaqs/speech-faq
-
- `!' indicates new or updated as of version 4.2
-
-
- Speech recognition here is discrete-utterance, speaker-independent,
- and small-vocabulary.
-
-
-
- examples: contains [so far] rudimentary speech example programs:
- * colors.c: speech demo opens large X window and changes
- colors when the color name is spoken,
- * recognize: C and C++ versions of the same program that
-
-
- ! inst: contains beta-level inst images of both the execution and
- development environments for speech recognition;
-
-
- lackey: a speech recognition application example, lackey recognizes
- speech through the use of the speech recognition library,
- and uses speech to launch desktop applications;
-
-
- utilities: [so far] contains three binaries--fmbg, gotoWindow, and
- xrset--useful to srpanel.
-
-
-
-
- Read this if you are interested in trying speech recognition
- (skip near the end for a simple but uninformative example):
-
- A version of the speech execution (speech_eoe) and development
- environment (speech_dev) is available in the inst subdirectory.
- Indigo (or later) audio capability and IRIX 5 are required. This
- software can match discrete utterances from any speaker against a
- small pretrained vocabulary. No extra hardware is required (but a
- better microphone usually helps). It uses a constant 10% of an R4K.
-
- Currently the only application understanding speech is Showcase.
- Other apps may be faked into responding to speech by having the
- speech manager send keystrokes to that app in response to speech.
- Currently this has been done for only MediaMail/Zmail, Zip/Jot,
- and 4Dwm. Several others are being experimented with including
- CASE, the Icon Catalog, other desktop entities, and Jot's electric
- C mode. Users may add their own actions and words to applications.
-
- This software is somewhere between alpha and beta stages, needing
- at least the following major improvements:
-
- * a real character for visual feedback
- * a complete set of trained vocabularies
- * integration with more apps (like the desktop)
- * UI improvements (operations on app word groups)
- * performance improvements
- * bug fixes
- * finished documentation
- * removal of debug output
- * placement in the toolchest or icon catalog
- * a way to deal with audio interference from the computer
-
- After installing speech_eoe, you must reboot before srpanel (the
- speech manager) can run. If you do not do this, srpanel will
- generate the error message "srpanel: could not connect to server".
- Make sure your microphone is plugged in and placed somewhere away
- from your noisy computer (do NOT hold the mic, as your breath and
- hands cause a lot of noise). Confirm an increase in apanel's level
- meter when speaking. Verify the mic is selected as input at 8 kHz
- and set the gain around 7 (this varies between Indigos and Indys).
- See the man pages speech, srpanel, speechbeta, and showspeech
- (although they are in need of an update). See the troubleshooting
- section of srpanel's help.
-
- After launching srpanel, verify it is hearing you correctly by
- speaking "go to sleep" and "wake up" and observing srpanel's change
- in state (when sleeping, srpanel will only recognize "wake up").
- When srpanel has focus, all trained words are active but no actions
- are taken. With focus on srpanel, verify that it recognizes "yes"
- and "no". If any of "go to sleep", "wake up", "yes" and "no" are
- not correctly recognized, train them using srpanel's customization
- window (select the word and click the train button).
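-
- The sleep/wake and focus behavior described above can be pictured
- with a small sketch. This is only an illustrative model, not
- srpanel's code: the class name RecognizerState, the method hear(),
- and the srpanel_has_focus flag are all hypothetical, and the sketch
- assumes "go to sleep" is honored wherever focus is.

```python
# Hypothetical model of the behavior described in the README text;
# none of these names come from srpanel or the speech library.
WAKE_PHRASE = "wake up"
SLEEP_PHRASE = "go to sleep"


class RecognizerState:
    """Tracks whether the modelled speech manager is asleep."""

    def __init__(self):
        self.sleeping = False

    def hear(self, phrase, srpanel_has_focus=False):
        """Return the phrase to act on, or None if no action is taken."""
        if self.sleeping:
            # While sleeping, only the wake phrase is recognized.
            if phrase == WAKE_PHRASE:
                self.sleeping = False
            return None
        if phrase == SLEEP_PHRASE:
            self.sleeping = True
            return None
        if srpanel_has_focus:
            # Words are recognized (useful for checking training),
            # but no action is taken.
            return None
        return phrase
```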
-
- Speech-aware showcase is invoked with the command showspeech
- (installed with speech_eoe.sw.misc). Showcase must already be
- installed. The vocabulary for showspeech is modal, so see the
- vocabulary section of showspeech's man page to understand what
- showspeech is expecting to hear. Showspeech is not an approved
- version of showcase, so don't report any bugs against it to the
- showcase group.
-
- Other apps such as 4Dwm and MediaMail respond to speech by way of
- the speech manager's recognition of a word and subsequent keystroke
- synthesis (they are speech-enabled rather than speech-aware). Because only
- keystrokes are communicated to the speech-enabled application,
- actions in response to speech are limited. You may add your own
- word-actions to srpanel's customization window, or use the "add from
- file" menu item to bring in predefined word-actions for some
- applications. Use MediaMail instead of Zmail (unless you use
- "zmail" to invoke it) - same for Jot/Zip. See the bindings in the
- customization window for an understanding of what can be spoken when
- (the current vocabulary is determined by the class name of the
- window which has focus).
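-
- As a rough picture of how the focused window's class name selects
- the current vocabulary, consider the sketch below. The table
- layout, the class names ExampleMailer and ExampleEditor, the
- phrases, and the bound keystrokes are all made up for illustration;
- the real bindings are the ones shown in the customization window.

```python
# Hypothetical bindings table: window class name -> {phrase: action}.
# Every entry here is a placeholder, not a real srpanel binding.
BINDINGS = {
    "ExampleMailer": {"next message": "n", "delete message": "d"},
    "ExampleEditor": {"save file": "<escape>w"},
}


def active_vocabulary(focused_class):
    """Phrases that may be spoken while this window class has focus."""
    return sorted(BINDINGS.get(focused_class, {}))


def action_for(focused_class, phrase):
    """Action bound to the phrase for this window class, or None."""
    return BINDINGS.get(focused_class, {}).get(phrase)
```

- In this model a phrase bound for one class is simply not in the
- vocabulary when a window of another class has focus.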
-
- Srpanel may be instructed to respond to speech in various ways.
- Some keys have multicharacter or symbolic names and are specified
- inside chevrons such as <escape>. Modifiers such as <alt> are
- released after a subsequent non-modifier. Key presses and releases
- may also be controlled. A delay event <delay> may be needed.
- Srpanel may respond to speech with actions other than keystrokes,
- such as button presses <B#> and shell commands <!shell command>.
- Using the shell command feature, there are ways to further
- manipulate the desktop such as switching desks, warping the pointer,
- and launching applications. See binaries in the inst location.
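-
- To make the action notation above concrete, here is a minimal
- sketch that splits an action string into events. It is not
- srpanel's parser. It assumes that <B#> means "B" followed by a
- button number, that any other chevron name is a key, and that a
- shell command contains no ">" character; tokenize() and the event
- tuples are hypothetical.

```python
import re

# One alternative per kind of event named in the text above:
# <!shell command>, <B#>, <name>, or a single literal character.
TOKEN = re.compile(
    r"<!(?P<shell>[^>]*)>"
    r"|<B(?P<button>\d+)>"
    r"|<(?P<name>[^<>]+)>"
    r"|(?P<char>.)",
    re.S,
)


def tokenize(action):
    """Split an action string into (kind, value) event tuples."""
    events = []
    for m in TOKEN.finditer(action):
        if m.group("shell") is not None:
            events.append(("shell", m.group("shell")))
        elif m.group("button") is not None:
            events.append(("button", int(m.group("button"))))
        elif m.group("name") is not None:
            name = m.group("name")
            if name == "delay":
                events.append(("delay", None))
            else:
                events.append(("key", name))
        else:
            # Anything outside chevrons is a single-character key.
            events.append(("key", m.group("char")))
    return events
```

- For example, tokenize("<alt>f<delay><B1><!xterm>") yields a key
- event for alt, a key event for f, a delay, a press of button 1, and
- a shell event carrying "xterm".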
-
- Only some of the words have been pretrained (none of the words for
- CASE), so more training *is* necessary.
-
- Most of the words for 4Dwm's predefined actions have been pretrained,
- along with a portion for MediaMail/Zmail and Jot/Zip, and only a
- few for CASE, so further training by the user is currently required
- to use even the predefined action bindings.
-
- Simple but uninformative example for some 4Dwm functionality:
-
- as root:
- # inst -f inst/speech_eoe
- verify everything is selected (default) and then do
- inst> go
- inst> exit
- then reboot,
- plug in your mic and set it on your monitor,
- login as yourself and run
- % srpanel
- launch apanel from srpanel's menu "Recognizer -> Audio Control Apanel"
- verify apanel's input rate at 8 kHz, source from mic, and gain at 7
- select srpanel's menu "Recognizer -> Customization"
- select Customization's menu "File -> Add From File"
- select "4Dwm" from the file browser
- place focus on any window (except any of srpanel's windows)
- say "raise window" or "lower window" and verify appropriate response
- train "yes", "no", "go to sleep", and "wake up"
- train other commands as necessary
- see above for more functionality
-
- An API document (in Showcase format; no dev man pages yet) is part of
- speech_dev.sw.misc and installs in
- /usr/share/data/speech/misc/recog.api.
-
- Speech synthesis is technically working on our machines, but we have
- no plans or deals to ship it, so it is not included or used in the
- current speech images.
-
- Email questions, problems, comments, suggestions to lpw@sgi.com
-
-
- -=+=--=+=--=+=--=+=--=+=--=+=--=+=--=+=--=+=--=+=-
- Lance Welsh                  Lance Welsh
- M/S 01L-875                  lpw@sgi.com
- Silicon Graphics, Inc.       wk: (415) 390-1860
- PO Box 7311                  hm: (415) 322-7225
- Mountain View, CA 94039-7311
- -=+=--=+=--=+=--=+=--=+=--=+=--=+=--=+=--=+=--=+=-
-
-
-